Using N-grams to Process Hindi Queries with Transliteration Variations
Abstract
Retrieval systems based on N-grams have been used as alternatives to word-based systems. N-grams offer a language-independent technique that allows retrieval based on portions of words. A query that contains misspellings or transliteration differences can defeat a word-based system, whereas an N-gram system is more resistant to these problems. We present an N-gram-based retrieval system built over a collection of Hindi songs and use it to study how varying N affects retrievability. We also present an alternative spell-checking tool based on N-grams. We conclude by discussing the number of N-grams that different values of N produce for different languages, and how to choose N.
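The core idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' system: the abstract does not specify the scoring function, so the sketch assumes character N-grams padded with "_" at word boundaries and ranks documents by the Dice coefficient of their N-gram multisets. The function names (char_ngrams, dice, rank, suggest, distinct_ngram_count) and the romanized song titles are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Assumption: words are lowercased, split on whitespace, and padded with "_"
# so that N-grams at word boundaries are distinguishable from inner ones.
def char_ngrams(text: str, n: int) -> Counter:
    """Return the multiset of character n-grams of every word in `text`."""
    grams: Counter = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        if len(padded) < n:
            # A padded word shorter than n becomes a single gram.
            grams[padded] += 1
            continue
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


# Assumption: similarity is the Dice coefficient over n-gram multisets,
# 2|A & B| / (|A| + |B|); the paper's actual scoring is not stated in the abstract.
def dice(a: Counter, b: Counter) -> float:
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    return 2 * sum((a & b).values()) / total


def rank(query: str, docs: list[str], n: int = 3) -> list[tuple[float, str]]:
    """Score every document against the query and sort best-first."""
    q = char_ngrams(query, n)
    return sorted(((dice(q, char_ngrams(d, n)), d) for d in docs), reverse=True)


# Hypothetical spell-checker: the suggestion is the vocabulary entry
# whose bigram multiset overlaps most with the input word.
def suggest(word: str, vocabulary: list[str], n: int = 2) -> str:
    target = char_ngrams(word, n)
    return max(vocabulary, key=lambda v: dice(target, char_ngrams(v, n)))


def distinct_ngram_count(docs: list[str], n: int) -> int:
    """Number of distinct n-grams in a collection, to compare values of N."""
    seen: set = set()
    for d in docs:
        seen.update(char_ngrams(d, n))  # iterating a Counter yields its keys
    return len(seen)


if __name__ == "__main__":
    # Hypothetical romanized song titles; not data from the paper.
    titles = ["pyaar kiya to darna kya", "dil to pagal hai", "chaiyya chaiyya"]
    for score, title in rank("pyar kiya", titles):
        print(f"{score:.3f}  {title}")
    print(suggest("pyaar", ["pyar", "pagal", "darna"]))
    for n in range(2, 6):
        print(n, distinct_ngram_count(titles, n))
```

With these hypothetical titles, the query "pyar kiya" ranks "pyaar kiya to darna kya" first (Dice = 14/27, about 0.52) even though the word "pyar" never appears verbatim: exactly the kind of transliteration variant that defeats exact word matching. The suggest function treats spelling correction the same way, as a nearest-neighbour search by N-gram overlap; again, this is an assumed design, not necessarily the tool the paper describes.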
Similar resources
Transliterated Search using Syllabification Approach
Machine transliteration refers to the process of automatically converting a word from one language to another without losing its phonological characteristics. In this work, we present our experiments performed in subtask-1 and subtask-2 as part of the FIRE-2013 transliterated search task. In both subtasks, the transliteration from Roman script to Devanagari script was performed using sylla...
Urdu Hindi Machine Transliteration using SMT
Transliteration is the process of transcribing a word of the source language into the target language such that, when a native speaker of the target language pronounces it, it sounds like the native pronunciation of the source word. Statistical techniques have brought significant advances and real progress in various fields of Natural Language Processing (NLP). In this paper, we have ana...
Hindi to Punjabi Transliteration using Phonetic and Orthographic Rules
One of the important applications of Natural Language Processing is machine translation. Machine transliteration is an emerging and very important research area within machine translation. Translation systems translate a message from the source language to the target language, keeping the exact meaning, while a transliteration system finds the same-meaning word/sentence in another language, ...
Hybrid Approach for Hindi to English Transliteration System for Proper Nouns
In this paper, a hybrid approach is presented to transliterate proper nouns written in Hindi into their English equivalents. The hybrid approach combines direct mapping, a rule-based approach, and a statistical machine translation approach. Transliteration is the process of generating words in the target language from words in the source language. The reverse process is known ...
IIIT Hyderabad’s CLIR experiments for FIRE-2008
This paper discusses our CLIR experiments performed for the FIRE workshop. We submitted runs for ad hoc monolingual document retrieval in Hindi and English, and for ad hoc cross-lingual document retrieval from Hindi to English and from English to Hindi. In this paper, we describe our English-to-Hindi and Hindi-to-English CLIR systems and the experiments conducted on them using the FIRE-2008 data...